Streaming data incorporates dynamicity due to a nonstationary environment 
where data samples may endure class imbalance and change in data 
distribution over the period causing concept drifts. In real-life applications, 
learning in dynamic data streams is vitally important and challenging. A 
combined solution to adapt to class imbalance and concept drifts in dynamic 
data streams is rarely addressed by researchers. With this motivation, the 
current communication presents the online ensemble model smart pools of 
data with ensembles for class imbalance adaptive learning (SPECIAL) to 
learn in skewed and drifting data streams. It employs an ageing-based 
G-mean maximization strategy to adapt to dynamicity in data streams. It 
employs smart data-pools with the local expertise ensemble to classify 
samples lying in the same data-pool. The empirical and statistical study on 
different evaluation metrics exhibits that SPECIAL is more adaptive to class 
imbalanced dynamic data streams than the state-of-the-art algorithms. 

1. INTRODUCTION 

The current era of information technology demands predictive models for numerous real-life 
applications like social media sentiments extraction [1], financial risk prediction [2], network intrusion 
identification [3], and so on. Such applications produce a continuous stream of endless data with high volume 
and speed [4]-[6]. Data streams possess dynamicity due to the varying distribution of data in them. The 
change in data distribution is referred to as “concept drift” [5], [7]. An unequal number of instances per 
class in a data stream causes skewness [8]-[9]. Learning in nonstationary data streams with class imbalance and 
drifting concepts is a challenging task in computational intelligence [10]. 

Learning in nonstationary data streams has attracted many researchers [7]. Ensemble learning 
employs a combination of single classifiers to augment the generalization capability and is more suitable to 
handle dynamicity in data streams [4], [5], [11]-[13]. Block-based ensembles like accuracy weighted 
ensemble (AWE) [14], accuracy updated ensemble (AUE) [15], dynamic weighted majority for imbalance 
learning (DWMIL) [16], and adaptive chunk-based dynamic weighted majority (ACDWM) [17] tackle concept 
drifts by inserting a new learner trained on a new block of data. Online ensembles like ADWIN Bagging 
(BagADWIN) [18] and online accuracy updated ensemble (OAUE) [19] train the model on each incoming data 
sample, without waiting for a block or the whole training dataset. Implicit adaptation to concept drifts in 
data streams leads to passive learning algorithms like dynamic weighted majority (DWM) [20], anticipative 
dynamic adaptation to concept change (ADACC) [21], and dynamic classifier selection (DCS) [22]. In contrast, 
explicit drift detection leads to an active learning approach, as described in [23]-[26]. 

A rich literature on the class imbalance problem is available [8], [27]; it describes three broad 
approaches to handling skewness in data: i) sampling-based, ii) algorithm-based, and iii) cost-sensitive. Most 
researchers have handled the problems of class imbalance and dynamicity in data streams separately. A 
combined solution to both problems has been presented by comparatively few researchers [28]-[29]. 

Through this correspondence, we provide a novel combined solution to the issue of class imbalance 
and concept drifts in dynamic data streams. We propose an online ensemble classifier smart pools of data 
with ensembles for class imbalance adaptive learning (SPECIAL) to classify binary dynamic data streams. 
The contributions of the proposed work are: 

— It provides a passive drift detection model using an online ensemble with a test-then-train approach. 

— It employs an ageing-based strategy to adapt to the dynamicity in data. 

— It deals with the class imbalance in streaming data with the objective of G-mean maximization. 

— It investigates empirical and statistical test results and compares the performance of the proposed 
algorithm SPECIAL with state-of-the-art ensembles on different performance measures using various 
real and synthetic benchmark datasets. 


2. RESEARCH METHOD 

The current section describes the problem of this research and the proposed methodology to solve it. 
The proposed methodology elaborates the training and testing phase of the proposed algorithm SPECIAL. 
The section further presents the experimental framework to test the performance of the algorithm SPECIAL. 


2.1. Problem formulation 

The data stream $DS$ arrives as a sequence of data samples $\{d_1, d_2, d_3, \ldots\}$, where $d_t$ is a data 
sample in the m-dimensional feature space that arrives at time step $t = 1, 2, \ldots$. Associated with each data 
sample $d_t$ there is a class label $c_t \in C = \{c_1, c_2, \ldots, c_L\}$, where $L$ is the number of class labels. As our work 
focuses on a binary classification problem, we consider $C = \{0, 1\}$, where '0' denotes the negative class and '1' 
denotes the positive class. Let $DS^0$ be the set of negative data samples and $DS^1$ be the set of positive data 
samples. We consider class imbalance in the data stream, where $|DS^0| \gg |DS^1|$. A concept in the data stream 
is defined by a joint distribution $P_t(DS, C)$. At time step $t$, it generates a tuple $(d_t, c_t)$. The dynamicity in the 
data stream may cause concept drift, where the joint distribution changes over time, i.e. 
$P_{t+1}(DS, C) \neq P_t(DS, C)$ [7]. As an online learner progresses with the most recently received data samples, it is 
suitable for handling the sequential flow of incoming data samples [7], [30]. Hence, to predict the class $c_t$ 
of an input data sample $d_t$ of a dynamic data stream, we propose an online classifier 
$f: DS \to C$. 


2.2. Proposed methodology 

Through this communication, we present an online adaptive ensemble SPECIAL. It is an ensemble 
of ensembles $E = \{e_1, e_2, \ldots, e_s\}$, where $s$ is the number of sub ensembles, designed to deal with dynamically 
drifting imbalanced data streams. We focus on the local expertise of each sub ensemble in the area of the 
m-dimensional feature space where the recent data sample has appeared. Thus, SPECIAL prefers the predictions of 
the sub ensemble that has exhibited better classification accuracy in this area. 

The data samples in the same area of the m-dimensional feature space are mapped to a single 
data-pool, which represents an m-dimensional hypersphere. The data samples in a data-pool point to their 
expertise sub ensemble $e_i$, $i \in \{1, 2, \ldots, s\}$, which gives the lowest classification error for the data 
samples in that data-pool. Each data-pool $p_j$, $j \in \{1, 2, \ldots, z\}$, where $z$ is the number of existing data-pools, is 
characterized by the following metadata at time step $t$: 

a. A pool-master ($PM_j^t$): It is a representative data sample of the j-th data-pool, chosen for the minimum 
prediction error in its classification. At time step $t$, a tie between equal prediction errors, if any, is 
resolved by selecting the data sample most recently mapped to that pool as its pool-master. 

b. Pool size ($PS_j^t$): It is the number of data samples mapped to the j-th data-pool by time step $t$. 

c. Average prediction error ($APE_j^t$): It is the average prediction error incurred in the classification of all 
data samples in the j-th data-pool by time step $t$. 

d. Prediction factor ($PF_j^t$): It is given as $PF_j^t = 1 - APE_j^t$. A higher value of $PF_j^t$ indicates that 
the expertise sub ensemble associated with the j-th data-pool shows better classification results. 

e. Representative factor ($RF_j^t$): It is the ratio of the pool size of the j-th data-pool to the total number of 
data samples received by time step $t$. 

f. Weight factor ($WF_j^t$): It is the ratio of the prediction factor $PF_j^t$ to the representative factor $RF_j^t$ of the 
j-th data-pool by time step $t$. A pool with a higher weight factor is preferred because it gives better 
predictions with fewer data samples mapped to it and, given its smaller pool size, more suitable data 
samples can still be mapped to it. 
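
The metadata listed above can be maintained incrementally as samples are mapped to a pool. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the class name `DataPool`, its field names, and the use of a 0/1 prediction error per sample are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataPool:
    """Hypothetical container for per-pool metadata (names are assumptions)."""
    pool_master: tuple          # representative sample of the pool
    master_error: float         # prediction error of the current pool-master
    pool_size: int = 0          # PS: samples mapped to this pool so far
    error_sum: float = 0.0      # running sum of prediction errors

    def add(self, sample: tuple, error: float) -> None:
        """Map a sample to the pool and update its metadata."""
        self.pool_size += 1
        self.error_sum += error
        # Ties go to the most recently mapped sample, hence "<=".
        if error <= self.master_error:
            self.pool_master, self.master_error = sample, error

    @property
    def ape(self) -> float:     # average prediction error
        return self.error_sum / self.pool_size if self.pool_size else 0.0

    @property
    def pf(self) -> float:      # prediction factor = 1 - APE
        return 1.0 - self.ape

    def rf(self, total_seen: int) -> float:   # representative factor
        return self.pool_size / total_seen

    def wf(self, total_seen: int) -> float:   # weight factor = PF / RF
        return self.pf / self.rf(total_seen)


pool = DataPool(pool_master=(0.2, 0.4), master_error=0.0)
pool.add((0.2, 0.4), 0.0)   # correctly classified sample
pool.add((0.3, 0.5), 1.0)   # misclassified sample
print(pool.ape, pool.pf, pool.rf(10), pool.wf(10))
```

In this sketch `wf` is only meaningful once at least one sample has been mapped to the pool, since the representative factor would otherwise be zero.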

The proposed SPECIAL classifier employs a test-then-train approach: it first predicts the class of 
each sample $(d_t, c_t)$ received at time step $t$ and then uses the same sample to train the 
model. The testing and training phases of the learning model are described in detail below. 
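
The test-then-train (prequential) protocol can be illustrated with a short Python sketch. It is only an illustration of the evaluation loop: the toy `MajorityClassLearner` and the function name `prequential` are assumptions introduced here and stand in for any online learner with `predict` and `learn` methods.

```python
from collections import Counter


class MajorityClassLearner:
    """Toy online learner (an assumption): predicts the most frequent label seen."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else 0

    def learn(self, x, y):
        self.counts[y] += 1


def prequential(stream, model):
    """Test-then-train: predict each sample first, then train on it."""
    correct = 0
    total = 0
    for x, y in stream:
        y_hat = model.predict(x)    # test on the still unseen sample
        correct += int(y_hat == y)
        total += 1
        model.learn(x, y)           # then use the same sample for training
    return correct / total if total else 0.0


stream = [((0.1,), 0), ((0.2,), 0), ((0.9,), 1), ((0.3,), 0)]
accuracy = prequential(stream, MajorityClassLearner())
print(accuracy)
```

Because every sample is predicted before it is learned, no separate hold-out set is needed and the reported score always refers to unseen data.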


2.2.1. Model testing phase 
For each incoming data sample at time $t$, the proposed algorithm identifies its K nearest pools. It 
assigns the highest priority to one of these K nearest pool-masters if it is the most recently defined 
pool-master and has the highest weight factor ($WF_j^t$). Then the expertise learner associated with the selected 
highest-priority j-th data-pool classifies the newly arrived data sample. On correct classification, the new data 
sample is mapped to the selected data-pool, and the metadata of that data-pool is updated. 

At the start, when no data-pool has been created and no pool-master is available, the first incoming data 
sample itself forms a new data-pool and becomes its pool-master. It is tested on each sub ensemble $e_i \in E$, 
$i \in \{1, 2, \ldots, s\}$. This newly formed data-pool points to the sub ensemble with the least prediction 
error. Accordingly, its metadata is updated. The prediction results are empirically and statistically examined 
on various evaluation metrics. The performance of the SPECIAL algorithm is compared with other state-of- 
the-art algorithms. Algorithm 1 describes the testing phase of the SPECIAL model. 


Algorithm 1: SPECIAL model testing 
Input: (d_t, c_t) is an incoming data instance at time t={1, 2, ...} of data stream DS; 
E={e_1, e_2, ..., e_s} is an ensemble of s ensembles; P={p_1, p_2, ..., p_z} is a collection of 
existing z data-pools. 
Output: ĉ_t: a predicted class of data instance d_t. 

P_near = Search_K-nearest-pools(d_t, K, P); 
if (P_near == ∅) { 
    ĉ_t = Get_Prediction(d_t, E); 
    p_1 = Create_New_Data-pool(d_t); Update_Metadata(p_1); } 
else { 
    p = Select_Highest-priority_Pool(P_near); 
    xe = Get_Associated_Expertise_Subensemble(p); 
    ĉ_t = Get_Prediction(d_t, xe); 
    if (ĉ_t == c_t) { 
        Map_to_Data-pool(d_t, p); Update_Metadata(p); } } 
Evaluate the performance of the SPECIAL model on various evaluation metrics using (ĉ_t); 
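
The pool lookup in the testing phase can be sketched in Python as follows. This is one possible reading, not the authors' code: the Euclidean distance to the pool-master, the dictionary keys (`master`, `wf`, `master_time`, `expert`), the tie-breaking order, and the function names are all assumptions made for illustration.

```python
import math


def k_nearest_pools(sample, pools, k):
    """Return up to k pools whose pool-masters are closest to the sample
    (Euclidean distance is an assumption; the text does not name the metric)."""
    return sorted(pools, key=lambda p: math.dist(sample, p["master"]))[:k]


def select_pool(near_pools):
    """Prefer the highest weight factor; break ties by the most recently
    defined pool-master (one possible reading of the priority rule)."""
    return max(near_pools, key=lambda p: (p["wf"], p["master_time"]))


def classify_sample(sample, pools, k, fallback_predict):
    """Predict with the expert of the selected pool, or fall back if no pool exists."""
    near = k_nearest_pools(sample, pools, k)
    if not near:
        return fallback_predict(sample), None
    pool = select_pool(near)
    return pool["expert"](sample), pool


pools = [
    {"master": (0.0, 0.0), "wf": 1.5, "master_time": 3, "expert": lambda x: 0},
    {"master": (1.0, 1.0), "wf": 2.0, "master_time": 5, "expert": lambda x: 1},
    {"master": (9.0, 9.0), "wf": 9.0, "master_time": 7, "expert": lambda x: 0},
]
prediction, chosen = classify_sample((0.4, 0.4), pools, k=2,
                                     fallback_predict=lambda x: 0)
print(prediction, chosen["master"])
```

The distant third pool has the largest weight factor but is excluded by the K-nearest search, so only pools local to the sample compete for priority.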


2.2.2. Model training phase 

The proposed SPECIAL classifier employs online learning as presented by Oza [31]. In batch bagging, 
the number of copies of each training instance in a bootstrap sample follows a binomial distribution. Because 
not all training instances are available at the start and the volume of incoming streaming data is huge, this 
distribution is approximated by a Poisson(1) distribution, which it approaches as the number of training 
instances n→∞. 
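
The limit behind this approximation can be written out explicitly. The following is the standard argument rather than part of the original text; the symbol $K$, denoting the number of copies of one instance in a bootstrap sample of size $n$ drawn with replacement, is introduced here for illustration.

```latex
P(K = k) = \binom{n}{k}\left(\frac{1}{n}\right)^{k}\left(1 - \frac{1}{n}\right)^{n-k}
\;\xrightarrow[n \to \infty]{}\; \frac{e^{-1}}{k!}, \qquad k = 0, 1, 2, \ldots
```

The right-hand side is the probability mass function of Poisson(1), so an online learner can present each arriving instance $K \sim \mathrm{Poisson}(1)$ times without knowing $n$ in advance.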

The SPECIAL algorithm adapts to dynamicity in the data stream by assigning more weight to 
recently arrived data instances. It incorporates two predefined ageing metrics: 1) the data-ageing metric 
$\beta$ ($0 < \beta < 1$) and 2) the sensitivity-ageing metric $\gamma$ ($0 < \gamma < 1$), which give more weight to the latest data and 
sensitivity (i.e. true positive rate). The function $f(c_t, c_L)$ returns 1 if the class label $c_t$ of the incoming data 
instance is $c_L$ and 0 otherwise, where $L \in \{0, 1\}$. The function $g(c_t, \hat{c}_t)$ returns 1 if the class label of the incoming 
data instance is correctly predicted and 0 otherwise. Let $DS_t^L$, $L \in \{0, 1\}$, be the metric defining the 
percentage of the negative class (if $L = 0$) or positive class (if $L = 1$) by time step $t$. Let $Se_t$ be the metric 
defining sensitivity by time step $t$. The ageing-based computations of $DS_t^L$ and $Se_t$ are 
given by (1) and (2), respectively. These computations show that the ageing metrics $\beta$ and $\gamma$ force 
old samples to contribute less to the class percentage and sensitivity metrics, 
respectively. 


$$DS_t^L = \beta \cdot DS_{t-1}^L + (1 - \beta) \cdot f(c_t, c_L), \quad L \in \{0, 1\} \qquad (1)$$

$$Se_t = \gamma \cdot Se_{t-1} + (1 - \gamma) \cdot g(c_t, \hat{c}_t) \qquad (2)$$
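
The two ageing-based updates are exponentially decayed running averages. A minimal Python sketch of (1) and (2) follows, with function names chosen here for illustration and the ageing metrics defaulting to 0.9, the value used in the experimental setup.

```python
def update_class_percentages(ds_prev, c_t, beta=0.9):
    """Eq. (1): time-decayed percentage of each class L in {0, 1}."""
    return {L: beta * ds_prev[L] + (1.0 - beta) * (1.0 if c_t == L else 0.0)
            for L in (0, 1)}


def update_sensitivity(se_prev, c_t, c_hat, gamma=0.9):
    """Eq. (2): time-decayed sensitivity, following the text as written."""
    return gamma * se_prev + (1.0 - gamma) * (1.0 if c_t == c_hat else 0.0)


ds = {0: 0.0, 1: 0.0}   # initial values, as in Algorithm 2
se = 0.0
# Three (true label, predicted label) pairs as a toy stream.
for c_t, c_hat in [(0, 0), (0, 1), (1, 1)]:
    ds = update_class_percentages(ds, c_t)
    se = update_sensitivity(se, c_t, c_hat)
print(ds, se)
```

Each update multiplies the previous value by the ageing metric, so a sample seen k steps ago contributes with a weight proportional to the k-th power of that metric.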

When the difference between the class percentages $|DS_t^0 - DS_t^1|$ crosses the predefined threshold $\Psi$, 
the proposed SPECIAL algorithm considers it a scenario of class imbalance at time step $t$. To handle 
class imbalance in binary data streams, it focuses on G-mean improvement as described in [32]. It attends to 
both the positive (minority) and negative (majority) classes and de-emphasises the negative class only when the 
class imbalance results in a poor true positive rate. Accordingly, when the data size n→∞ and there is a class 
imbalance in the incoming data stream, the number of copies $B$ of the positive and negative class samples under 
Poisson($\lambda$)-approximated bootstrapping at time step $t$ is given by (3). In the training phase, the sub ensembles 
associated with data-pools are trained using $B$ copies of the data sample $(d_t, c_t)$. The trained 
model of the SPECIAL classifier at time step $t$ is used to test the unseen incoming data sample $(d_{t+1}, c_{t+1})$ at 
the next time step $t + 1$. Algorithm 2 describes the training phase of the SPECIAL model. 


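$$B \sim \begin{cases} \mathrm{Poisson}(Se_t), & \text{if } |DS_t^0 - DS_t^1| > \Psi \text{ and } c_t = 0 \\ \mathrm{Poisson}\left(DS_t^0 / DS_t^1\right), & \text{if } |DS_t^0 - DS_t^1| > \Psi \text{ and } c_t = 1 \\ \mathrm{Poisson}(1), & \text{if } |DS_t^0 - DS_t^1| \le \Psi \end{cases} \qquad (3)$$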
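
The case distinction in (3) can be sketched in Python as below. The function names, the guard against a zero positive-class percentage, and the use of Knuth's multiplication method to draw the Poisson variate are assumptions added for illustration; the threshold default of 0.6 is the value from the experimental setup.

```python
import math
import random


def poisson_rate(ds0, ds1, se, c_t, threshold=0.6):
    """Rate of the Poisson distribution selected by Eq. (3)."""
    if abs(ds0 - ds1) > threshold:
        if c_t == 0:
            return se               # de-emphasise the negative (majority) class
        if ds1 > 0:                 # guard added in this sketch, not in the paper
            return ds0 / ds1        # over-sample the positive (minority) class
    return 1.0                      # no imbalance detected: plain online bagging


def sample_poisson(lam, rng):
    """Draw a Poisson variate with Knuth's method; adequate for small rates."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


rng = random.Random(0)
lam = poisson_rate(ds0=0.85, ds1=0.15, se=0.4, c_t=1)
copies = sample_poisson(lam, rng)   # B: how many times the sample is trained on
print(lam, copies)
```

With these toy percentages a positive sample is replayed about 5.7 times on average, whereas a negative sample would be replayed with rate 0.4 while sensitivity stays low.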


Algorithm 2: SPECIAL model training 
Input: (d_t, c_t) is an incoming data instance at time t={1, 2, ...} of data stream DS; 
E={e_1, e_2, ..., e_s} is an ensemble of s ensembles; P={p_1, p_2, ..., p_z} is a collection of z 
data-pools; ĉ_t is a predicted class of instance d_t in the testing phase; Ψ is the 
imbalance threshold. 
Output: An updated model of SPECIAL. 
Initialize: DS_t^0 = DS_t^1 = 0; Se_t = 0; 

if (ĉ_t ≠ c_t) { 
    p = Select_Highest-priority_Pool(P); 
    xe = Get_Associated_Expertise_Subensemble(p); 
    ŷ_t = Get_Prediction(d_t, xe); 
    if (ŷ_t == c_t) { 
        Map_to_Data-pool(d_t, p); Update_Metadata(p); } 
    else { p_new = Create_New_Data-pool(d_t); Update_Metadata(p_new); P = P ∪ p_new; } } 
Update metrics DS_t^0, DS_t^1 using equation (1) and Se_t using equation (2); 
if (|DS_t^0 − DS_t^1| > Ψ and c_t == 0) { 
    B ~ Poisson(Se_t); Repeat B times training of the ensemble E={e_1, e_2, ..., e_s}; } 
else if (|DS_t^0 − DS_t^1| > Ψ and c_t == 1) { 
    B ~ Poisson(DS_t^0 / DS_t^1); Repeat B times training of the ensemble E={e_1, e_2, ..., e_s}; } 
else { B ~ Poisson(1); Repeat B times training of the ensemble E={e_1, e_2, ..., e_s}; } 


2.3. Experimental framework 

This section describes the experimentation framework used to assess the performance of SPECIAL. 
It provides the details of various datasets used for the experimentation. It also specifies the performance 
metrics used and the necessary experimental setup for the current study. 


2.3.1. Datasets 

A variety of datasets are widely used in the study on dynamic data streams as mentioned in [33]. 
The current study uses six popular binary datasets. Table 1 summarizes the description of these datasets. The 
used benchmark data streams have variations in the number of samples, the number of attributes, imbalance 
ratio, and types of drifts. The Electricity pricing, Agrawal and rotating hyperplane (RotHyperplane) datasets are 
taken from the massive online analysis (MOA) framework [34]. The Weather, SEA and rotating 
checker board with constant drift rate (RotChBoard-CDR) datasets are available in the repository by [35]. 


2.3.2. Performance metrics 

The presented research focuses on the classification of dynamic streams which may possess concept 
drift and skewness. Let TP, FP, TN, and FN be the numbers of true positive, false positive, true negative, and 
false negative data samples resulting from the binary classification, respectively. For skewed data, accuracy 
alone, $(TP + TN)/(TP + TN + FP + FN)$, cannot justify the performance of the classifier as it is 
dominated by the majority samples [7]-[8]. However, G-mean, $\sqrt{(TP/(TP + FN)) \cdot (TN/(TN + FP))}$, 
accounts for the classification of both positive and negative samples as it is the geometric mean of sensitivity 
and specificity. Hence, in addition to accuracy, the current study investigates the predictive capabilities of 
SPECIAL using G-mean and F1-measure, 
$\frac{2 \cdot (TP/(TP + FN)) \cdot (TP/(TP + FP))}{(TP/(TP + FN)) + (TP/(TP + FP))}$. 
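
The three metrics can be computed directly from the confusion-matrix counts. The sketch below is illustrative: the function name and the toy counts are assumptions, and zero denominators are mapped to 0.0 as a convention chosen here.

```python
import math


def stream_metrics(tp, fp, tn, fn):
    """Accuracy, G-mean and F1-measure from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # recall of positives
    specificity = tn / (tn + fp) if tn + fp else 0.0   # recall of negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return accuracy, g_mean, f1


# Skewed toy example (hypothetical counts): every sample is predicted negative.
acc, g, f1 = stream_metrics(tp=0, fp=0, tn=95, fn=5)
print(acc, g, f1)
```

In this toy case accuracy is high although no positive sample is recognised, while G-mean and F1-measure both drop to zero, which is why accuracy alone is insufficient for skewed streams.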

Table 1. The summary of datasets used 

| Dataset | Samples | Classes | Attributes | Positive samples | Negative samples | Type of drifts |
|---|---|---|---|---|---|---|
| Weather (R) | 18159 | 2 | 8 | 5698 | 12461 | N/A |
| Electricity Pricing (R) | 45312 | 2 | 8 | 19237 | 26075 | N/A |
| SEA (S) | 50000 | 2 | 3 | 18581 | 31419 | Real, Abrupt |
| Agrawal (S) | 100000 | 2 | 9 | 32656 | 67344 | N/A |
| RotHyperplane (S) | 200000 | 2 | 10 | 99935 | 100065 | Real, Gradual |
| RotChBoard-CDR (S) | 409600 | 2 | 2 | 204758 | 204842 | Real, Gradual |

(R): real; (S): synthetic; N/A: the dataset has a drift, but its type is not available. 


2.3.3. Experimental setup 

The proposed SPECIAL algorithm builds an ensemble of ensembles. For the same, it blends five 
seminal ensembles as its base learners. The list of these sub ensembles used to build a learning model of 
SPECIAL is: 

a. Hoeffding tree (HT) [36]: It is an incremental decision tree to learn in a streaming environment. 

b. Dynamic weighted majority (DWM) [20]: It is a chunk-based ensemble that continuously evaluates the 
weights of its base learners considering the prediction results. It replaces the poor-performing base 
learner and updates the ensemble. 

c. ADWIN bagging (BagADWIN) [23]: It is an extension of the online bagging algorithm [31] by 
integrating the ADWIN (ADaptive sliding WINdowing) algorithm [37] in it. 

d. ADWIN boosting (BoostADWIN) [23]: It is an extension of the online boosting algorithm [31] by 
combining the ADWIN (ADaptive sliding WINdowing) algorithm [37]. 

e. Anticipative dynamic adaptation to concept change (ADACC) [21]: It is an incremental ensemble 
capable of handling recurring concept drifts using Kappa statistics. 

The performance of SPECIAL is compared with seven state-of-the-art algorithms used in data 
stream classification. It is compared with its five sub ensembles as mentioned in the above list. In addition to 
that it is also compared with the following two classifiers: 

a. Hierarchical linear four rates (HLFR) [26]: It is an online method for concept drift detection in the 
dynamic data stream. 

b. Adaptive chunk-based dynamic weighted majority (ACDWM) [17]: It is a block-based ensemble 
method for concept drift detection in the streaming environment. 

All algorithms are evaluated using the test-then-train approach. We set both the data-ageing metric $\beta$ 
and the sensitivity-ageing metric $\gamma$ to 0.9. The threshold $\Psi$ used to detect a class imbalance 
scenario is set to 0.6. In the testing phase, we search for K=20 nearest data-pools. 


3. RESULTS AND DISCUSSION 

This part evaluates the performance of the proposed SPECIAL algorithm on a variety of datasets. It 
presents the empirical results of the experimentation. It also describes the statistical analysis of the current 
work. 


3.1. Empirical results 

We empirically test the performance of SPECIAL on three evaluation metrics: i) accuracy, ii) 
G-mean, and iii) F1-measure, and compare it with seven state-of-the-art algorithms used to learn in the 
streaming environment. Tables 2 to 4 present the experimental results of all algorithms on the three metrics. All 
results are given in percentages, and the values in parentheses indicate the rank of the algorithm when 
tested on a specific dataset. The minimum value of the average rank indicates the best performance of an 
algorithm. 
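
The ranking scheme can be reproduced with a few lines of Python. The sketch below ranks the eight algorithms on the Weather accuracy values reported in Table 2 and averages SPECIAL's six accuracy ranks; the function names are introduced here for illustration, and ties are not handled.

```python
def rank_descending(scores):
    """Rank algorithms on one dataset: 1 = highest score (ties not handled)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def average_rank(ranks_per_dataset):
    """Mean of an algorithm's ranks over all datasets."""
    return sum(ranks_per_dataset) / len(ranks_per_dataset)


# Accuracy on the Weather dataset, taken from Table 2.
weather_accuracy = {
    "HT": 73.43, "DWM": 70.13, "OzaBagAdwin": 75.01, "OzaBoostAdwin": 74.39,
    "ADACC": 73.57, "HLFR": 61.61, "ACDWM": 71.09, "SPECIAL": 75.08,
}
ranks = rank_descending(weather_accuracy)

# SPECIAL's accuracy ranks on the six datasets, taken from Table 2.
special_avg = average_rank([1, 1, 3, 4, 3, 2])
print(ranks["SPECIAL"], round(special_avg, 2))
```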

Table 2 presents the accuracy results of all algorithms. SPECIAL gives the highest accuracy on the real 
datasets Weather and Electricity. With the lowest average rank, it shows the overall best 
performance on the accuracy metric. The G-mean results of all algorithms are given in Table 3. Considering the 
average rank over all datasets, SPECIAL provides the best G-mean results. Table 4 summarizes 
the F1-measure results of all algorithms. SPECIAL is the best performer with the minimum average rank 
on F1-measure. Figure 1 depicts the overall average of ranks of all algorithms. The proposed 
algorithm SPECIAL, with the lowest overall mean rank, outperforms all other state-of-the-art 
classifiers. The ageing factors used in SPECIAL give more significance to recent data, which helps it adapt 
to the latest changes in the incoming data. SPECIAL incorporates the G-mean improvement strategy with 
Poisson($\lambda$)-approximated bootstrapping, which focuses on the recalls of both the positive and negative classes. 
Also, the use of an ensemble of locally expert sub ensembles in SPECIAL alleviates the 
limitations of an individual classifier and gives improved classification results. Thus, SPECIAL provides 
better results for adaptive online learning in skewed dynamic data streams than other state-of-the-art learners. 


Table 2. Accuracy percentage and ranking of all algorithms on all datasets 

| Algorithm | Weather | Electricity | SEA | Agrawal | RotHyperplane | RotChBoard-CDR | Avg. Rank |
|---|---|---|---|---|---|---|---|
| HT | 73.43 (5) | 80.33 (5) | 94.1 (2) | 85.32 (5) | 84.09 (4) | 60.21 (8) | (4.83) |
| DWM | 70.13 (7) | 78.98 (6) | 87.55 (5) | 87.87 (2) | 89.86 (1) | 71.95 (6) | (4.5) |
| OzaBagAdwin | 75.01 (2) | 83.67 (4) | 94.58 (1) | 87.01 (3) | 88.01 (2) | 85.37 (3) | (2.5) |
| OzaBoostAdwin | 74.39 (3) | 88.13 (3) | 88.17 (4) | 83.45 (7) | 76.57 (7) | 94.34 (1) | (4.17) |
| ADACC | 73.57 (4) | 89.66 (2) | 82.83 (7) | 84.78 (6) | 82.77 (6) | 80.03 (5) | (5) |
| HLFR | 61.61 (8) | 60.82 (8) | 61.44 (8) | 60.17 (8) | 55.68 (8) | 66.57 (7) | (7.83) |
| ACDWM | 71.09 (6) | 76.48 (7) | 87.21 (6) | 94.35 (1) | 83.28 (5) | 81.64 (4) | (4.83) |
| SPECIAL | 75.08 (1) | 90.26 (1) | 93.03 (3) | 86.05 (4) | 86.31 (3) | 93.1 (2) | (2.33) |

Table 3. G-mean percentage and ranking of all algorithms on all datasets 

| Algorithm | Weather | Electricity | SEA | Agrawal | RotHyperplane | RotChBoard-CDR | Avg. Rank |
|---|---|---|---|---|---|---|---|
| HT | 63.16 (6) | 80.03 (5) | 93.51 (2) | 82.66 (5) | 84.08 (4) | 60 (8) | (5) |
| DWM | 70.45 (2) | 77.16 (6) | 79.03 (6) | 85.09 (2) | 89.86 (1) | 71.95 (6) | (3.83) |
| OzaBagAdwin | 60.66 (7) | 82.45 (4) | 94.3 (1) | 84.54 (3) | 88.01 (2) | 85.37 (3) | (3.33) |
| OzaBoostAdwin | 67.85 (4) | 87.73 (3) | 85.32 (5) | 81.16 (7) | 76.57 (7) | 94.34 (1) | (4.5) |
| ADACC | 67.81 (5) | 89.29 (2) | 73.59 (7) | 81.9 (6) | 82.77 (6) | 80.03 (5) | (5.17) |
| HLFR | 43.74 (8) | 59.79 (8) | 53.79 (8) | 46.02 (8) | 55.67 (8) | 66.57 (7) | (7.83) |
| ACDWM | 72.23 (1) | 75.74 (7) | 85.86 (4) | 94.86 (1) | 83.28 (5) | 81.64 (4) | (3.67) |
| SPECIAL | 68.84 (3) | 89.81 (1) | 91.72 (3) | 83.64 (4) | 86.31 (3) | 93.1 (2) | (2.67) |

Table 4. F1-measure percentage and ranking of all algorithms on all datasets 

| Algorithm | Weather | Electricity | SEA | Agrawal | RotHyperplane | RotChBoard-CDR | Avg. Rank |
|---|---|---|---|---|---|---|---|
| HT | 52.36 (6) | 77.15 (5) | 91.04 (2) | 79.04 (5) | 84.15 (4) | 62.13 (8) | (5) |
| DWM | 59.98 (2) | 73.56 (6) | 76.69 (6) | 82.43 (2) | 89.86 (1) | 71.87 (7) | (4) |
| OzaBagAdwin | 50.45 (7) | 79.86 (4) | 91.85 (1) | 81.46 (3) | 87.98 (2) | 85.38 (3) | (3.33) |
| OzaBoostAdwin | 57.59 (4) | 85.92 (3) | 81.22 (5) | 76.86 (7) | 76.58 (7) | 94.33 (1) | (4.5) |
| ADACC | 57.29 (5) | 87.73 (2) | 68.33 (7) | 78.14 (6) | 82.74 (6) | 79.95 (6) | (5.33) |
| HLFR | 28.45 (8) | 54.25 (8) | 42.67 (8) | 31.41 (8) | 55.27 (8) | 84.49 (4) | (7.33) |
| ACDWM | 62.14 (1) | 72.14 (7) | 82.53 (4) | 91.77 (1) | 83.28 (5) | 80.51 (5) | (3.83) |
| SPECIAL | 58.85 (3) | 88.37 (1) | 89.21 (3) | 80.21 (4) | 86.28 (3) | 93.1 (2) | (2.67) |

Figure 1. Overall average of ranks of all algorithms 


3.2. Statistical results 


The empirical results in Tables 2 to 4 show varying performances of the different algorithms on 
different datasets. Hence, to rigorously assess the performance of all studied algorithms, we carry out the 
nonparametric statistical tests endorsed in [38]. We perform the Iman-Davenport test with a confidence of 
95% (α=0.05) on all evaluation metrics of the above-mentioned eight algorithms. It rejects the null hypothesis 
(H0: ranks of all algorithms are equivalent) for each evaluation metric and infers that at least one of the 
studied algorithms shows better performance than the others on each measure. As the empirical result in Figure 1 
suggests that the proposed algorithm SPECIAL is the overall best performer, we conduct the pairwise 
Friedman post hoc test with Finner’s correction [38] to statistically analyse whether SPECIAL is the best 
performer among the other seven state-of-the-art classifiers on each metric. Table 5 summarizes the results of 
the post hoc test with a confidence of 95% (α=0.05). The bold-faced values indicate the noteworthy 
performance improvement of SPECIAL as compared to the other seven classifiers. 


Table 5. Results of pairwise Friedman post hoc test (α=0.05) to compare SPECIAL on all metrics 

| Metric | HT | DWM | OzaBagAdwin | OzaBoostAdwin | ADACC | HLFR | ACDWM |
|---|---|---|---|---|---|---|---|
| Accuracy | 0.2 | 0.2 | 0.9 | 0.2 | 0.2 | 0 | 0.2 |
| G-mean | 0 | 0 | 0 | 0.1 | 0 | 0 | 0.2 |
| F1-measure | 0.1 | 0.4 | 0.6 | 0.1 | 0.1 | 0 | 0.1 |
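
For reference, the Iman-Davenport statistic is derived from the Friedman chi-square statistic computed on the average ranks. The sketch below applies the standard formulas to the accuracy average ranks of Table 2 (N = 6 datasets, k = 8 algorithms); the function names are introduced here, the rounded ranks from the table are used as input, and the result is not claimed to reproduce the exact values obtained by the authors.

```python
def friedman_chi2(avg_ranks, n_datasets):
    """Friedman statistic from the average ranks of k algorithms on N datasets."""
    k = len(avg_ranks)
    return (12.0 * n_datasets / (k * (k + 1))) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4.0)


def iman_davenport(avg_ranks, n_datasets):
    """Iman-Davenport correction of the Friedman statistic."""
    k = len(avg_ranks)
    chi2 = friedman_chi2(avg_ranks, n_datasets)
    return (n_datasets - 1) * chi2 / (n_datasets * (k - 1) - chi2)


# Accuracy average ranks from Table 2, in the order
# HT, DWM, OzaBagAdwin, OzaBoostAdwin, ADACC, HLFR, ACDWM, SPECIAL.
accuracy_ranks = [4.83, 4.5, 2.5, 4.17, 5, 7.83, 4.83, 2.33]
chi2 = friedman_chi2(accuracy_ranks, n_datasets=6)
f_stat = iman_davenport(accuracy_ranks, n_datasets=6)
print(chi2, f_stat)
```

The resulting statistic is compared against the critical value of the F distribution with k−1 and (k−1)(N−1) degrees of freedom at the chosen significance level.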


4. CONCLUSION 

The proposed algorithm SPECIAL provides a novel joint solution to the challenging problem of 
learning in dynamic data streams with skewness and concept drifts. SPECIAL is a passive drift detection 
ensemble with the smartness of G-mean maximization and ageing-based adaptive learning. It integrates five 
seminal ensembles as its base learners. It forms the smart pools of data mapping to the same area of the 
feature space. These pools point to the local expertise sub ensembles which are likely to give the best 
classification results in that feature space. SPECIAL follows online learning with a test-then-train approach. 
It adapts to the dynamicity in data by employing an ageing-based strategy to forget the historic data and to 
emphasize the recent data. It handles skewness in data streams with the objective of G-mean maximization. 
The performance of SPECIAL is compared with seven state-of-the-art ensembles on three performance 
metrics: i) accuracy, ii) G-mean, and iii) F1-measure, using a variety of benchmark datasets. Based on the 
empirical analysis of these metrics the overall average ranking of SPECIAL indicates that it outperforms the 
other state-of-the-art ensembles in adaptive learning of dynamic data streams. The statistical analysis 
underpins that the proposed online ensemble model shows noteworthy performance improvement. The 
current research presents a passive drift detection model of an ensemble of ensembles. The future study will 
explore the active drift detection model to handle different types of drifts. Also, the presented study assesses 
the performance of the proposed model empirically and statistically. So, in the future, we would like to focus 
on the theoretical performance guarantees of the SPECIAL algorithm. 